---
slug: optimize-try-more
title: Further Optimizing TRY_CAST and TRY
authors: [mbasmanova]
tags: [tech-blog,expressions]
---

## TL;DR

Queries that use TRY or TRY_CAST may experience poor performance and high CPU
usage due to excessive exception throwing. We optimized CAST to indicate
failure without throwing and introduced a mechanism for scalar functions to do
the same. Microbenchmark measuring worst case performance of CAST improved
100x. Samples of production queries show 30x cpu time improvement.

## TRY and TRY(CAST)

TRY construct can be applied to any expression to suppress errors and turn them
into NULL results. TRY_CAST is a version of CAST that suppresses errors and
returns NULL instead.

For example, `parse_datetime('2024-05-', 'YYYY-MM-DD')` fails:

```
	 Invalid format: "2024-05-" is too short
```

, but `TRY(parse_datetime('2024-05-', 'YYYY-MM-DD'))` succeeds and returns NULL.

Similarly, `CAST('foo' AS INTEGER)` fails:

```
	Cannot cast 'foo' to INT
```

, but `TRY_CAST('foo' AS INTEGER)` succeeds and returns NULL.

## TRY_CAST vs. TRY(CAST)

TRY can wrap any expression, so one can wrap CAST as well:

```
  TRY(CAST('foo' AS INTEGER))
```

Wrapping CAST in TRY is similar to TRY_CAST, but not equivalent. TRY_CAST
suppresses only cast errors, while TRY suppresses any error in the expression
tree.

For example, `CAST(1/0 AS VARCHAR)` fails:

```
  Division by zero
```

, `TRY_CAST(1/0 AS VARCHAR)` also fails:

```
  Division by zero
```

, but `TRY(CAST(1/0 AS VARCHAR))` succeeds and returns NULL.

In this case, the error is generated by division operation (1/0). TRY_CAST
cannot suppress that error, but TRY can. More generally, TRY(CAST(...))
suppresses all errors in all expressions that are evaluated to produce
an input for CAST as well as errors in CAST itself, but TRY_CAST suppresses
errors in CAST only.

## What happens when many rows fail?

In most cases only a fraction of rows generates an error. However, there are
queries where a large percentage of rows fail. In these cases, a lot of CPU
time goes into handling exceptions.

For example, one Prestissimo query used 3 weeks of CPU time, 93% of
which was spent processing try(date_parse(...)) expressions where most rows
failed. Here is a profile for that query that shows that all the time went into
stack unwinding:

<figure>
 <img src="/img/optimize-try-more-profile1.png"/>
</figure>

This query processes 14B rows, ~70% of which fail in date_parse(...) function
due to the date string being empty.

```
    presto> select try(date_parse('', '%Y-%m-%d'));
     _col0
    -------
     NULL
    (1 row)


    – TRY suppressed Invalid format: "" error and produced a NULL.
```

Velox tracks the number of suppressed exceptions per operator / plan node and
reports these as numSilentThrow runtime stat. For this query, Velox reported
21B throws for a single FilterProject node that processed 14B rows. Before the
optimizations, each failing row used to throw twice. An <a href="https://velox-lib.io/blog/optimize-try_cast.">earlier blog post</a>
from <a href="https://www.linkedin.com/in/laith-sakka-629ab384/">Laith Sakka</a> explains why.
After the optimizations this query’s CPU time dropped
to 17h: 30x difference from the original cpu time. Compared to
Presto Java, this query uses 4x less cpu time (originally it used 6x more).

<figure>
 <img src="/img/optimize-try-more-numSilentThrow.png"/>
</figure>

We observed similar issues with queries that use other functions that parse
strings as well as casts from strings.

## Solution

To avoid the performance penalty of throwing exceptions we need to report errors
differently. Google’s <a href="https://abseil.io/docs/cpp/guides/status">Abseil</a> library uses absl::Status to return errors from
void functions and absl::StatusOr to return value or error from non-void
functions. <a href="https://arrow.apache.org/cookbook/cpp/basic.html#id8">Arrow</a> library
has similar Status and Result. Our own Folly has <a href="https://github.com/facebook/folly/blob/main/folly/Expected.h">folly::Expected</a>.
Inspired by these examples we introduced <a href="https://github.com/facebookincubator/velox/pull/8084">velox::Status</a> and <a href="https://github.com/facebookincubator/velox/pull/9858">velox::Expected</a>.

velox::Status holds a generic error code and an error message.

`velox::Expected<T>` is a typedef for `folly::Expected<T, velox::Status>`.

For example, a non-throwing modulo operation can be implemented like this:

```
  Expected<int> mod(int a, int b) {
    if (b == 0) {
      return folly::makeUnexpected(Status::UserError(“Division by zero”));
    }

    return a % b;
  }
```

## Non-throwing Simple Functions

We extended the Simple Function API to allow authoring non-throwing scalar
functions. The function author can now define a ‘call’ method that returns
Status. Such a function can indicate an error by returning a non-OK status.

```
  Status call(result&, arg1, arg2,..)
```

These functions are still allowed to throw and exceptions will be handled
properly, but not throwing improves performance of expressions that use TRY.

Modulo SQL function would look like this:

```
    template <typename TExec>
    struct NoThrowModFunction {
      VELOX_DEFINE_FUNCTION_TYPES(TExec);

      Status call(int64_t& result, const int64_t& a, const int64_t& b) {
        if (b == 0) {
          return Status::UserError("Division by zero");
        }

        result = a % b;
        return Status::OK();
      }
    };
```

We changed date_parse, parse_datetime, and from_iso8601_date Presto functions
to use the new API and report errors without throwing.

## Non-throwing Vector functions

Vector functions can implement non-throwing behavior by leveraging the new
EvalCtx::setStatus(row, status) API. However, nowadays we expect virtually all
functions to be written using Simple Function API.

## Non-throwing CAST

CAST is complex. A single name refers to multiple dozen individual operations.
The full matrix of <a href="https://facebookincubator.github.io/velox/functions/presto/conversion.html#supported-conversions">supported conversions</a> is available in the Velox
documentation. Not all casts throw. For example, cast from an integer to a
string does not throw. However, casts from strings may fail in multiple ways. A
common failure scenario is cast from an empty string. Laith Sakka optimized
this use case earlier.

```
> select cast('' as integer);
Cannot cast '' to INT
```

However, we are also seeing failures in casting non-empty strings and NaN floating point values to integers.

```
> select cast(nan() as bigint);
Unable to cast NaN to bigint

> select cast('123x' as integer);
Cannot cast '123x' to INT
```

CAST from string to integer and floating point value is implemented using
`folly::to` template. Luckily there is a non-throwing version: `folly::tryTo`.
We changed our CAST implementation to use `folly::tryTo` to avoid throwing.
Not throwing helped improve performance of TRY_CAST by 20x.

Still, the profile showed that there is room for further improvement.

## Do not produce or store error messages under TRY

After switching to non-throwing implementation, the profile showed that half the
cpu time went into folly::makeConversionError. folly::tryTo returns result or
ConversionCode enum. CAST uses folly::makeConversionError to convert
ConversionCode into a user-friendly error message. This involves allocating and
populating a string for the error message, copying it into the std::range_error
object, then copying it again into Status. This error message is very helpful
if it is being propagated all the way to the user, but it is not needed if the
error is suppressed via TRY or TRY_CAST.

<figure>
 <img src="/img/optimize-try-more-profile2.png"/>
</figure>

To solve this problem we introduced a thread-local flag, threadSkipErrorDetails,
that indicates whether Status needs to include a detailed error message or not.
By default, this flag is ‘false’, but TRY and TRY_CAST set it to ‘true’. CAST
logic checks this flag to decide whether to call folly::makeConversionError or
not. This change gives a 3x performance boost to TRY_CAST and 2x
to TRY.


```
    if (threadSkipErrorDetails()) {
      return folly::makeUnexpected(Status::UserError());
    }

    return folly::makeUnexpected(Status::UserError(
        "{}", folly::makeConversionError(result.error(), "").what()));
```


After this optimization, we observed that TRY(CAST(...)) is up to 5x slower than
TRY_CAST when many rows fail.

The profile revealed that 30% of cpu time went to
EvalCtx::ensureErrorsVectorSize. For every row that fails, we call
EvalCtx::ensureErrorsVectorSize to resize the error vector to accommodate that
row. When many rows fail we end up resizing a lot: resize(1), resize(2),
resize(3),...resize(n). We fixed this by pre-allocating the error vector in the TRY
expression.

Another 30% of cpu time went into managing reference counts for
`std::shared_ptr<std::exception_ptr>` stored in the errors vector. We do not need
error details for TRY, hence, no need to store these values. We fixed this by
making error values in error vector optional and updating EvalCtx::setStatus to
skip writing these under TRY.

<figure>
 <img src="/img/optimize-try-more-profile3.png"/>
</figure>

After all these optimizations, the microbenchmark that measures performance of
casting invalid strings into integers showed 100x improvement. The benchmark
evaluates 4 expressions:

* TRY_CAST(‘’ AS INTEGER)
* TRY(CAST(‘’ AS INTEGER))
* TRY_CAST(‘$’ AS INTEGER)
* TRY(CAST(‘$’ AS INTEGER))

When we started, the benchmark results were:

```
===============================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
================================================================
cast##try_cast_invalid_empty_input                          2.40ms    417.47
cast##tryexpr_cast_invalid_empty_input                    402.63ms      2.48
cast##try_cast_invalid_nan                                392.14ms      2.55
cast##tryexpr_cast_invalid_nan                            827.09ms      1.21
```

At the end the numbers improved 100x:

```
cast##try_cast_invalid_empty_input                          2.16ms    463.62
cast##tryexpr_cast_invalid_empty_input                      4.29ms    232.95
cast##try_cast_invalid_nan                                  5.47ms    182.83
cast##tryexpr_cast_invalid_nan                              7.76ms    128.81
```

Note: The performance of TRY_CAST(‘’ AS INTEGER) hasn’t changed because this
particular use case has been optimized by Laith Sakka earlier.

## Next steps

We can identify queries with a high percentage of numSilentThrow rows and
change throwing functions to not throw.

For simple functions this involves changing the ‘call’ method to return Status
and replacing ‘throw’ statements with return Status::UserError(...). You get
extra points for producing error messages conditionally based on thread-local
flag threadSkipErrorDetails().

```
template <typename TExec>
struct NoThrowModFunction {
  VELOX_DEFINE_FUNCTION_TYPES(TExec);

  Status call(int64_t& result, const int64_t& a, const int64_t& b) {
    if (b == 0) {
      If (threadSkipErrorDetails()) {
          return Status::UserError();
      }
      return Status::UserError("Division by zero");
    }

    result = a % b;
    return Status::OK();
  }
};
```

We are changing CAST(varchar AS date) to not throw.

We provided a non-throwing ‘call’ API for simple functions that never return a
NULL for a non-NULL input. This covers the majority of Presto functions. For
completeness, we would want to provide non-throwing ‘call’ APIs for all other
use cases:

- bool call() for returning NULL sometimes
- callAscii for processing all-ASCII inputs
- callNullable for processing possibly NULL inputs
- callNullFree for processing complex inputs with all NULLs removed.

## Acknowledgements

Thank you <a href="https://www.linkedin.com/in/laith-sakka-629ab384/">Laith Sakka</a>
for doing the initial work to investigate and optimize TRY_CAST for empty strings
and sharing your findings in a blog post.

Thank you <a href="https://www.linkedin.com/in/orrierling">Orri Erling</a> for
adding numSilentThrow runtime stat to report number of suppressed exceptions.

Thank you <a href="https://www.linkedin.com/in/pedro-pedreira/">Pedro Eugenio Rocha Pedreira</a>
for introducing the velox::Status class.

Thank you <a href="https://www.linkedin.com/in/bikramjeet-vig/">Bikramjeet Vig</a>,
<a href="https://www.linkedin.com/in/gongchuo-lu-23bb251b/">Jimmy Lu</a>,
<a href="https://www.linkedin.com/in/orrierling">Orri Erling</a>,
<a href="https://www.linkedin.com/in/pedro-pedreira/">Pedro Eugenio Rocha Pedreira</a> and
<a href="https://www.linkedin.com/in/xiaoxuanmeng">Xiaoxuan Meng</a> for brainstorming and
helping with code reviews.